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Abstract:  A  program  to  solve  a  real  cubic  equation  efficiently  and  as 

accurately  as  the  data  deserve  is  not  yet  an  entirely  cut-and-dried  affair. 
An  iterative  method  is  the  best  found  so  far.  This  method  plus  same  other 
issues,  like  accuracy,  scaling,  preconditioning  and  testing,  are 
discussed  in  these  nates  in  enough  detail  to  convey  an  impression  of  what 
Numerical  Analysis  is  about. 


1.  Introduction: 

Closed-form  formulas  for  solving  the  real  cubic  equation 

Ax’  +  Bx*  +  Cx  +  D  =  0 

in  terms  of  its  coefficients  A,  B,  C,  D  were  discovered  in  the 
sixteenth  century  by  Italian  mathematicians,  but  their  triumph 
turned  into  disappointment  when  they  discovered  an  irreducible 
case :  the  real  cubic  with  three  irrational  real  roots.  This  case 

entails  unavoidably  the  computation  of  tri gonometr i c  functions  and 
their  inverses  during  the  evaluation  of  cube  roots  of  a  complex 
number.  Nowadays  tri gonometr i c  functions  and  complex  numbers  seem 
unobjectionable  in  a  procedure  that  solves  a  cubic,  .  so  they  have 
been  used  freely  in  a  modern  version  of  the  Italians'  formulas 
presented  below  in  §2  of  these  notes.  Alas,  the  modern  formula 
is  disappointing  too,  because  it  is  potentially  unstable  in  the 
face  of  roundoff.  Indeed,  coefficients  abound  for  which  some  of 
the  roots  computed  from  the  formula  are  quite  incorrect:  several 
instances  appear  among  the  examples  presented  in  §10  . 

Whether  a  slight  modification  could  protect  the  Italians'  formulas 
from  the  worst  effects  of  roundoff  remains  an  open  question.  The 
simplest  stable  version  of  those  formulas  I  know  is  tantamount  to 
evaluating  them  twice,  as  is  mentioned  near  the  end  of  §2  .  Two 
evaluations  take  long  enough  to  make  plausible  the  possibility 
that  another  approach  might  be  faster. 

Newton  pioneered  another  approach  when  he  first  used  the  iteration 
that  now  bears  his  name  to  solve  a  cubic.  Computers  can  tallow 
his  approach  provided  certain  details  like  where  to  start  and  when 
to  stop  are  mechanized.  Those  details  are  the  subject  of  §3  ,  a 
long  discussion  that  culminates  in  a  brief  but  entirely  automatic 
procedure  presented  as  a  program  QBC  in  §4  .  That  discussion 
provides  merely  a  motive  for  the  program,  not  a  proof  of  its 
correctness.  A  thorough  proof  would  be  far  too  lengthy  to  include 
in  these  notes.  Instead,  the  issues  that  such  a  proof  would  have  * 
to  address  will  be  explored  and  its  conclusions  summarized.  I 


1  >  v.  ■  ,  IV'I  m2  f 


!  ■  >  *  cJ  ti'.'v  ft*  | 

... -ii,  i^Mnnr- 


et  f/  /? 


Cubi c 1 


WORK  IN  PROGRESS 


Nov .  10,  1 9S6 


] 


The  most  difficult  issue  is  inaccuracy  caused  by  roundo-f-f.  Error 
analysis  proves  that  every  root  computed  by  QBC  is  no  more  in 
error  than  if  it  had  been  computed  exactly  -from  a  cubic  whose 
coef  f  i  ci  ents  differ  from  those  given  each  by  a  few  units  in  its 
last  digit  carried  by  the  computer's  floating-point  arithmetic. 
This  kind  of  Backward  Error  Analysis  was  first  published  in  the 
late  1950's  by  James  H.  Wilkinson.  It  suggests  that  inaccuracy 
introduced  by  the  process  of  solving  the  cubic  is  unlikely  to  be 
appreciably  worse  than  inaccuracies  previously  introduced  when  the 
coefficients  were  computed  and  rounded  off.  Therefore,  if  roots 
obtained  from  QBC  turn  out  too  inaccurate  for  some  ulterior 
purpose,  the  trouble  may  lie  not  so  much  with  QBC  as  with  the 
process  that  generated  the  coef f i ci ents.  Thus  does  backward  error 
analysis  exculpate  the  programmer  of  QBC.  And  it  does  more. 

The  uncertainty  contributed  to  the  computed  roots  by  roundoff  in 
QBC  can  now  be  assessed  by  analyzing  the  effects  upon  those  roots 
of  tiny  perturbat i ons  of  the  cubic’s  coefficients,  regardless  of 
the  internal  details  of  QBC  .  Even  without  those  details,  this 
analysis  is  tedious;  only  its  conclusions  are  summarized  in  15  . 
Computed  roots  turn  out  normally  to  be  accurate  in  all  but  thei r 
last  few  digits;  but  in  worst  cases,  when  all  three  roots  of  the 
cubic  almost  coincide,  the  computed  roots  can  lose  as  many  as  two 
thirds  of  the  figures  carried.  Examples  in  §5,  §7  and  §10  bear 
out  this  gloomy  prediction,  to  which  we  shall  return  later. 

Besides  being  too  long  to  include  in  these  notes,  the  proofs  of 
the  foregoing  claims  to  accuracy  are  at  least  as  vulnerable  to 
error  as  the  short  program  they  are  supposed  to  vindicate..  Such 
claims  deserve  credence  only  if  they  are.  supported  by  numerical 
experiments.  But  rounding  errors  committed  during  the  experiments 
can  confound  the  test  results  and  obscure  their  implications.  §6 
discusses  such  issues  and  offers  a  partial  remedy  in  the  form  of  a 
program  REVAL  that  combines  the  evaluation  of  a  cubic  polynomial 
with  the  simultaneous  calculation  of  a  rigorously  correct  bound 
for  the  effect  of  roundoff  upon  that  evaluation.  REVAL  is  based 
upon  prior  knowledge  of  a  bound  for  the  rounding  error  in  every 
floating-point  arithmetic  operation;  that  bound  is  characteristic 
of  the  computer  and  deducible  from  attributes  like  the  number  of 
significant  digits  it  carries.  REVAL  and  programs  like  it  permit 
the  error  in  a  computed  root,  regardless  of  its  provenance,  to 
be  overesti mated  with  ease  as  rigorously  as  one  likes  and  without 
excessive  pessimism  provided  the  root  lies  far  enough  away  from 
all  the  others.  Clustered  roots  are  a  little  harder  to  handle. 

The  previous  two  paragraphs  may  suggest  (and  it's  widely  believed) 
that  clustered  roots  of  a  cubic  cannot  be  calculated  accurately 
unless  arithmetic  is  performed  carrying  about  three  times  as  many 
significant  figures  as  will  be  assuredly  correct  in  the  computed 
roots.  That  is  untrue.  Also  untrue  is  another  widely  believed 
myth  about  numerical  computation,  namely  that  numerical  error  is 
caused  by  cancel  1  at i on .  In  ^act,  on  almost  all  modern  computers, 
no  new  error  is  generated  when  subtractive  cancellation  occurs; 
the  principal  exceptions  are  CRAYs,  CYBERs  and  UNIVACs.  On  IBM 
370’s,  DEC  VAX’s,  SUN's,  APPLE  Macintoshes  and  Hewl ett-F’ackarri 
calculators,  to  mention  just  a  few,  subtractive  cancellation  is 
exact.  This  fact  can  be  exploited  to  Precondition  a  cubic  with 
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clustered  roots,  transforming  it  into  a  new  cubic  with  relatively 
well  separated  roots  that  are  easy  to  calculate  and  transform  back 
into  fully  accurate  roots  of  the  original  equation.  A  simplified 
version  of  precondi ti on i ng ,  applicable  principally  to  cubics  with 
integer  coef f i ci ents ,  is  described  in  §7  with  examples  that  may 
suggest  how  the  process  would  work  in  general.  Thus  have  we 
confronted  two  myths  about  roundoff  and  cancelled  them  both. 

After  roundoff,  the  second  hazard  to  be  overcome  during  numerical 
computation  is  spurious  over/underf 1 ow,  an  event  that  occurs  when 
intermediate  results  would  be  so  huge  or  so  tiny  as  to  lie  outside 
the  range  of  numbers  normally  representabl e  in  the  computer  even 
though  the  desired  final  results  lie  within  range.  This  hazard  is 
encountered  only  rarely,  and  then  it  can  be  overcome  by  Seating, 
which  is  described  in  §8  . 

The  final  few  sections  of  these  notes  are  archival.  §9  presents 
a  collection  of  cubics  with  known  zeros  that  help  to  test  programs 
like  QBC  or  its  competitors.  §10  exhibits  selected  but  typical 
results  obtained  from  our  versions  of  the  Italians'  formula  and  of 
Newton's  iteration  (QBC)  programmed  into  an  HP-15C  handheld 
calculator.  The  program  for  QBC  is  supplied  in  §11,  and  the 
running  times  for  both  methods  are  compared  briefly  in  §12  . 


2-  A  Formula  in  "Closed  Form"  : 

A  cubic  polynomial  Ax3  +  Bx*  +  Cx  +  D  has  three  zeros  x  =  xt, 
x2,  x a  that  can  be  expressed  explicitly  in  terms  of  its  given 
coefficients  A,  B,  C,  D  in  many  ways.  The  formula  chosen  below 
is  one  of  the  better  ones,  and  has  been  arranged  in  the  form  of 
an  algorithm  that  can  easily  be  programmed  into  a  computer: 


A,  B,  C  and  D  are  given  real  numbers. 

If  A  =  0 

then  C  x3  i  =  <  I  B I  +  I  C|  +  I  D  |  ) /A  ;  .  .  .  oo  or  0/0  . 

p  :=  -C/2  ;  ...  Next  solve  Bx*  -  2px  +  D  =  0 

q  1=  /(p2  -  B  D)  ;  ...  possibly  an  imaginary  number 

if  q  is  Real  ...»  in  which  case  q  >  0  ,  ... 

then  £  r  :=  p  +  sign(p)q  ;  ...  =  p  +  q  ... 

if  r  *»  0 

then  £  ...  Zeros  are  0  or  oo  or  0/0  , 

:<  t  t  —  D/B  !  Xa  * =  —  x  i  ’■ 

else  £  X)  1=  D/r  ;  xa  '=  r/B  1  } 

else  {  Xi  1=  p/B  +  q/B  ;  xa‘=p/B-q/B  1  } 

else  £  b  .*=  - (B/A ) /3  ;  c  :  =  C/A  ;  d  D/A  ; 

.  .  .  Now  solve  xs  -  3bx*  +  cx  +  d  =  0  ... 

s  : =  3  b2  -  c  ; 
t  .‘=  (s-b2)b  -d  ; 

.  .  .  Now  :<  =  b  -  y  where  sy  -  y3  =  t  ... 
i  f  s  =  0  > 

then  £  yt  1=  -t1'3  ;  ...  the  real  cube  root 

ya  1=  yi  (-1  +  W 3)/2  j 

else  £  u  t=  /<4s/3>  ;  ...  possibly  imaginary. 

v  1=  arcsi n ( (3t/s) /u) /3  ;  ...  may  be  complex 

w  :=  (rt/3)  sign  (Re  <  v)  )  -  v  ;  ...  =  +rr/3  -  v  .. 

y«  u  sin(v)  ;  y3  :=  u  sin(w)  >  ; 

x,  :=  b  -  y,  ;  xa  1=  b  -  ya  ;  xs  :=  y,  >  v2  +  b 


m 


Codes 
d/or™ 
special 


Cubic  1 


WORK  IN  PROGRESS 


Nov.  10,  1986 


This  algorithm  was  programmed  into  an  HP-15C  calculator  without 
di  ff  i  cul  ty .  On  many  another  machine  programming  might  be  impeded 
by  the  absence  o-f  complex  sin  and  arcs  in  -from  its  library  of 
elementary  -functions.  Then  the  -following  -formulas  may  help: 

If  z*  >  1  then  arcsin(z)  =  (rr/2  -  i  arccosh  (  I  z  |  )  >  z/|z!  . 

If  z  is  real,  arcsin(tz)  =  t  arcsinn  (:)  , 

cos(«z)  =  cosh(z)  ,  and 
sin(iz)  =  isinh(z)  .  (  t  =  /- 1  ) 

With  the  aid  of  these  formulas  and  some  algebraic  manipulation, 
the  algorithm  can  be  freed  from  all  nontrivial  complex  arithmetic, 
but  only  at  the  cost  of  introducing  more  case  analysis.  In  place 
of  the  formulas  involving  complex  arcs  in  and  sin ,  there  will 
be  three  cases.  One  case  handles  s  <  0  .  If  s  >  0  (in  which 
case  u  >  0  too),  there  are  two  more  cases  according  to  where 
|3t/(su)|  lies  relative  to  1  .  But  multiplying  cases  can  only 
exacerbate  the  first  of  three  flaws  that  mar  the  algorithm: 

First,  the  algorithm  is  complicated,  and  therefore  vulnerable  to 
oversights.  Have  all  singularities  been  considered  and  handled 
correct 1 y? 

Second,  the  algorithm  is  vulnerable  to  over/underf low.  Even  when 
all  three  zeros  lie  well  within  range,  over/underf 1 ow  can  blight 
the  intermediate  quantities  q,  r,  s  and  t  .  The  natural  defense 
against  over/underf  low  is  scaling  ,  another  cornpl  ication. 

Third,  the  algorithm  is  vulnerable  to  roundoff,  particularly 
when  the  zeros  are  of  wildly  different  magnitudes;  then  the  zeros 
of  smaller  magnitude  tend  to  be  computed  relatively  inaccurately. 
(Examples  of  inaccuracy  can  be  found  at  the  end  of  these  notes. ) 
All  figures  can  be  lost  in  any  zero  whose  magnitude  is  smaller 
than  a  rounding  error  in  b  .  One  way  to  calculate  the  tiniest 
zero  more  accurately  is  to  obtain  it  as  the  reciprocal  of  the 
biggest  zero  of  A  +  Bz  +  Cza  +  Dz3  ,  which  is  tantamount  to 
running  the  foregoing  algorithm  a  second  time.  To  compute  the 
zero  of  middle  magnitude,  divide  -D/A  by  the  other  two  zeros. 

Another  way  to  improve  the  accuracy  of  a  zero  is  to  use  some  kind 
of  iteration  that  improves  approximate  zeros  by  exploiting  the 
cubic's  behavior  near  them;  a  short  step  past  this  thought  finds 
us  contemplating  whether  the  cubic  might  be  better  solved  by  an 
altogether  iterative  method  than  by  explicit  formulas.  Just  such 
an  iteration  is  the  next  topic  discussed  in  these  notes. 


3.  Newton's  Iteration: 

Given  the  real  cubic  polynomial  Q(x)  1  =  Ax3  +  Bx*  +  Cx  +  D  ,  we 
may  use  iteration  X„*,  ,*  =  X„  -  Q  <X„)  /Q'  ( X„ )  for  n  =  0,  2,  2,  ... 
to  find  a  real  zero  of  Q(x)  provided  we  can  solve  four  problems: 
-  How  shall  Q(X)/Q'(X)  be  calculated  efficiently? 

Where  is  a  good  place  to  choose  the  starting  iterate  X0  ? 

When  should  the  iteration  be  stopped? 

Having  found  one  zero,  how  do  we  find  the  other  two? 
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The  -Following  scheme  computes  Q(X)  and  Q'(X)  at  the  cost  of 
4  multiplications  per  iteration: 

qo  AX  ;  q,  :=  q0  +  B  ;  q2  1=  q(X  +  C  ; 

Q'(X>  1=  (q0+q»)X  +  q2  ;  Q(X)  1=  q2X  +  D  . 

Three  preliminary  divisions  of  all  the  coefficients  of  Q(x)  by 
A  could  subsequently  save  one  multiplication  per  iteration,  but 
doing  so  would  exacerbate  roundoff  and  raise  questions  about  over/ 
underflow,  questions  best  answered  by  scaling  all  coefficients  of 
G!(x)  in  advance  in  a  way  to  be  discussed  in  §8  below. 

Finding  a  good  starting  iterate  X0  is  a  balancing  act  among  many 
contending  considerations.  First  comes  the  numerical  stability  of 
the  deflation  process  by  which,  after  a  real  zero  has  been 
computed,  it  will  be  removed  from  the  cubic  to  yield  a  quadratic 
whose  zeros  are  the  remaining  two  zeros  of  the  cubic.  The  process 
of  deflation  is  numerically  stable  unless  the  zero  being  removed 
is  much  tinier  than  one  zero  of  the  quadratic  but  much  bigger  than 
the  other.  X0  can  be  chosen  to  avoid  that  unstable  situation. 

A  second  consideration  is  speed.  Newton's  iteration  converges 
very  quickly  if  started  close  enough  to  a  simple  zero,  but 
converges  very  slowly  to  a  multiple  zero.  Therefore,  X0  should 
ideally  be  extremely  close  to  a  triple  zero,  if  Q(x)  has  one, 
or  else  much  closer  to  a  simple  zero  than  to  a  double  zero  if 

Q(x)  has  both  of  those.  Here  is  a  way  to  choose  such  an  X0  : 

Assuming  AD  #  0  ,  let  b  1 =  -(B/A)/3  ;  r  t=  |Q(b)/A|,/s  >0  : 

and  s  :=  sign(Q(b)/A)  =  +1  .  If  Q'(b)/A  >_  0  then  XQ  b  -  sr 

else  X0  :=  b  -  1 .324718  s  max  Cr ,  V(~Q' ( b)/A)>  .  Why  does  this 
choice  work?  The  next  paragraph  will  explain.  To  better  follow 
its  argument,  read  it  repeatedly  with  reference  to  the  graphs  of , 
say,  !<5  +  ?x  +  2  for  9  =  -9,  -3,  -1,  0,  1  and  3  superposed 
upon  each  other  to  show  how  its  leftmost  real  zero  increases  with 
9  .  That  leftmost  zero  is  the  goal  of  the  iteration. 

Why  start  iterating  at  X©  ?  Observe  that  Q"(b)  =  0  ;  therefore 
x  =  b  at  the  inflexion  on  the  graph  of  Q(x)  ,  and  furthermore 
G<b-y>  =  Qfb)  -  Q'(b)y  -  Ay5  .  If  Q'(b)/A  >  0  then  this  cubic 
is  strictly  rnonotonic  with  just  one  real  zero  y  that  must  lie 
between  y  =  0  and  y  *  sr  ;  otherwise  the  real  zero  y  farthest 
from  0  lies  beyond  y  =  sr  and  beyond  y  *  s/(-Q’  (b ) /A)  too, 
but  not  beyond  both  xsr  and  (b ) /A)  ,  where  x  is  the 

real  root  X  =  1.32471  79572  44746...  of  X3  =  X  +  1  .  Since  the 
desired  real  zero  X  lies  between  the  starting  iterate  X0  and 
the  inflexion  point  b  ,  and  the  cubic  is  monotone  between  X  and 
X0,  Newton's  iteration  converges  rnonotoni cal  1  y  and  rapidlv  to  the 
desired  real  zero.  In  the  special  case  that  X0  =  b  no  further 
iteration  will  occur  because  then  b  is  the  cubic's  triple  zero. 

When  should  the  iteration  XR*t  1=  X„  -  Q  ( X„ )  /Q'  ( X„ )  be  stooped'7’ 
Except  when  X0  =  b  ,  we  would  expect  sign(X„*i  -  X„ =  s  -fat- 
all  n  ;  but  that  expectation  cannot  persist  indefinitely  in  the 
face  of  roundoff.  Ultimately  roundoff  must  cause  Xn.t  -  X„  to 
vanish  or  take  the  wrong  sign,  or  cause  Q’(X„)  to  vanish;  in 
either  case  we  shall  set  X  1=  X„  and  accept  it  as  a  real  zero  of 
the  cubic.  Since  any  iteration  could  take  too  long  to  home  in  to 
X  =5  0  ,  which  occurs  if  D  =  0  ,  that  case  is  segregated.  And 
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the  quotient  Q/Q'  must  be  replaced  by  <Q/Q‘ )  /  1 . 000.  .  .  001  to 

overcompensate  tor  roundoff  that  could  otherwise  carry  X„  too 
•far  beyond  its  goal.  When  X  is  extremely  tiny,  that  extra 
division  prevents  X„  -from  jumping  over  X  to  0  ,  as  otherwise 
it  would  in  one  o-f  the  examples  in  §10  .  Roundo-f-f  can  cause  yet 
another  kind  o-f  overshoot  when  the  cubic's  three  zeros  are  closely 
clustered;  X„  can  -fall  between  two  zeros.  We  avoid  the  worst 
e-f-fects  o-f  this  overshoot  by  accepting  X  =  X„  instead  o-f  X„*,  . 

Our  policies  for  handling  roundoff  and  stopping  the  iteration  are 
not  the  only  possibilities,  but  they  are  among  the  simplest. 

With  one  real  zero  X  in  hand,  the  next  task  is  deflation  to 
obtain  the  quadratic  Axa  +  B,x  +  Ca  whose  zeros  are  the  two 
remaining  zeros  of  the  cubic.  Here  are  the  deflation  formulas- 
If  I  Xs  |  >  i  D/A  i  then  C  Ca  :  =  -D/X  ;  B,  .‘=  <Ca  -  C> /X  1 

else  C  B,  AX  +  B  ;  Ca  B,X  +  C  1 

...  <  recall  qt  and  q2  above  )  ... 

One  formula  for  Ca  comes  from  the  product  of  the  cubic's  zeros, 
-D/A  =  X  C2/ A  .  The  choice  for  Bt  was  derived  from  an  error- 
analysis  that  looked  at  the  sum  of  the  zeros,  -B/A  =  X  ~  B, /A  , 
and  at  the  sum  of  their  reciprocals,  -C/D  =  1/X  -  B, /C2  ,  to 
find  out  which  is  least  perturbed  by  the  error  in  X  .  Of  course, 
different  formulas  have  to  be  used  when  A  =  0  or  D  =  0  . 

Finally,  formulas  for  solving  a  quadratic  equation  are  taken  from 
the  algorithm  presented  earlier. 


4.  Iterative  Algorithm  QBC  : 

The  following  algorithm,  arranged  to  facilitate  programming,  is 
complete  except  for  scaling  precautions  against  over/underf .low. 

It  is  broken  into  subprocedures  that  make  it  easier  to  understand. 


Real  Function  DISC(a,  b,  c)  I*  ba  -  a c  ; 

...  Later,  during  the  discussion  of  Precondi t i oni ng  in  §7  . 
...  another  version  of  DISC  will  be  presented  that  is  more 
...  accurate  when  a,  b,  c  are  all  integers  and  not  too  biq. 
End  DISC  . 


Procedure  QDRTC (  A,  B,  C,  Xt  +  <Y»,  Xa+tYa  ): 

...  Given  real  coefficients  A,  B,  C  ,  this  procedure  delivers 
...  the  two  zeros  Xj  +  eYj  of  the  quadratic  Ax 2  +  E-ix  +  C  . 
b  : =  -B/2  ;  q  :=  DISC<A,  b,  C)  ; 

If  q  <  0 

then  C  X,  :=  b/A  ;  Xa  .*=  X,  ; 

Y,  ,*=  /(-qJ/A  ;  Ya  .*=  -Y,  J 
else  -C  Y,  :=  0  ;  Ya  :=  0  ; 

r  :=  b  +  sign(b)/q  ;  ...  =  b  +  /q  . 

If  r  =  0 

then  -C  X,  :=  C/A  ;  Xa  ,'=  -X,  J 

else  -t  X,  .*=  C/r  ;  Xa  :=  r/A  3-  >  ; 

Return  ;  End  QDRTC  . 
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Procedure  EVAL  (  X,  A,  B,  C,  D,  Q,  Q' ,  B,  ,  Ct  ): 

...  Given  real  X  and  real  coefficients  A,  B,  C,  D  of  the 
...  cubic  Q(x)  =  Ax3  +  Bxa  +  Cx  +  D  ,  this  procedure  computes 
Q  :=  Q(X)  ,  Q'  :=  Q'  (X)  ,  B,  1=  AX  +  B  and  C2  :=  B,X  +  C  . 

q0  ‘  —  AX  {  Bj  • =  qo  -+■  B  5  C2  B,  X  +  C  ; 

Q'  :=  (q©+B, )  X  +  Ca  ;  Q  :=  C2X  +  D  ; 

Return  ;  End  EVAL  . 


Procedure  QBC  (  A,  B,  C,  D,  X,  X,  +  tY,  ,  Xa+iYa  ): 

...  Given  real  coefficients  A,  B,  C,  D  of  the  cubic 

...  Ax3  +  Bxa  +  Cx  +  D  ,  this  procedure  computes  a  real  zero  X 

...  and  two  complex  zeros  Xj  +  iYj  of  the  cubic. 

If  A  =  0  then  C  X  1=  oo  ;  A  :=  B  ;  b,  ;=  C  ;  c2  ;=  D  : 

go  to  fin  3  ; 

If  D  =  0  then  C  X  1=  0  ;  b,  ,*=  B  c2  :=  C  ; 

go  to  fin  3  ; 

X  :=  - (B/A) /3  ;  call  EVAL(X,  A,B,C,D,  q,  q'  ,  b,  ,  c2,  ; 
t  1=  q/A  ;  r  1=  3/ltl  ;  s  !*  sign(t)  ;  ...  =  +1  . 
t  J“  — q*/A  j  if  t  >  0  znen  r  1=  1 . 32471 8  max  (r ,  yt)  i 
x«  :■  X  -  or  5  if  x o  =  X  then  go  to  fin  ; 

Do  C  X  1=  x o  ;  call  EVAL(X,  A,B,C,D,  q,  q' ,  b,  ,  c2)  ; 

if  q'  =  0  then  x0  1=  X 

else  x©  l —  X  —  <q/q' ) / 1 . 000.  . 001  3 
until  sxo  <  sX  ;  ...  stop  when  x©  6  X  . 

If  1  A | Xa  >  | D/X 1 

then  {  c2  1=  -D/X  ;  b,  :=  (c2-C>/X  3; 
fin-.  call  QDRTC  (  A,  bi  ,  c2,  Xi  +  tYt,  X2+  iY*  )  ; 

Return  ;  End  QBC  . 


S.  Accuracy: 

A  rigorous  assessment  of  the  effects  of  roundoff  upon  QBC  would 
be  too  complicated  to  include  in  these  notes,  but  the  conclusions 
from  such  an  assessment  will  be  stated  here,  followed  later  in 
§6  ("Testing  Considerations")  and  §7  ( "Preconditioning" )  by 

some  suggestions  about  what  can  be  done  about  those  effects. 

Provided  over /underflow  does  not  intrude,  QBC's  combination  of 
iteration  and  deflation  always  produces  results  scarcely  worse 
than  if  the  cubic's  coefficients  had  each  been  perturbed  by  a  few 
rounding  errors  at  the  start.  In  the  worst  case,  when  the  three 
zeros  of  the  cubic  are  all  relatively  nearly  coincident,  they  may¬ 
be  correct  to  as  few  as  a  third  of  the  figures  carried;  such  a 
loss  of  accuracy  also  may  afflict  the  closed  form  formula  in  that 
case.  The  phenomenon  is  illustrated  by  the  following  example: 

Consider  the  cubic  x3  -  3xa  +  3x  -  (J-s)  ,  where  1-s  is  the 

number  next  less  than  1  representabl e  in  the  floating-point 
format  used  during  computation.  The  zeros  of  this  cubic  are  the 
three  values  of  1  -  s,/3  .  For  instance,  if  12  sig.  dec.  are 
carried  during  computation,  1-s  =  0.9999  9999  9999  and  the  real 
zero  1  -  s,/3  =  0.9999  .  Changing  the  coefficient  1-s  in  its 
12**’  sig.  dec.  to  1  changes  all  three  zeros  in  the  4th  to  1  . 

In  other  examples,  with  two  nearly  coincident  zeros  relatively 
far  from  the  third,  about  half  the  figures  carried  can  be  lost 
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regardless  of  how  the  cubic  is  solved.  But  QBC  never  loses  all 
the  figures  carried,  as  the  closed-form  formulas  can.  Examples 
to  show  what  can  happen  will  be  presented  later.  Here  is  a 
summary  of  the  conclusions  that  can  be  drawn  from  error  analysis: 


Each  zero  Z  computed  by  QBC's  combination  of  iteration  and 
deflation  is  accurate  almost  to  whichever  is  the  largest  of  .. 

-  as  many  figures  as  were  carried  less  the  sum  of  the 

figures  to  which  the  other  two  zeros  agree  with 

-  half  of  the  excess  of  the  number  of  figures  carried 

number  of  figures  of  agreement  between  Z  ,  one 
of  coincident  or  nearly  coincident  zeros,  and  a 
relatively  different  from  the  pair,  or  ... 
a  third  of  the  figures  carried,  if  all  three  zeros 
coincident  or  nearly  coincident  with  Z  . 


numbers  o*: 
7  ,  or 
over  the 
of  a  pa i r 
third  zero 

are 


No  way  is  known  to  calculate  the  zeros  of  a  cubic  more  accurately 
than  if  its  coefficients  had  first  been  perturbed  by  roundoff, 
unless  part  of  the  calculation  is  performed  exactly  —  with  no 
roundoff  at  all.  That  exact  calculation  is  part  of  a  process 
called  “F’recondi  t  i  oni  ng  "  ,  which  will  be  described  later  in  ?<7 


6.  Testing  Considerations: 

The  obvious  way  to  test  QBC  is  to  supply  it  with  arguments  for 
which  accurate  results  have  been  calculated  by  some  other  method, 
and  then  compare.  On  reflection,  this  test  procedure  is  not  so 
obvious.  What  other  method  will  give  accurate  results?  Cubic; 
can  be  constructed  with  small  integer  coefficients  and  at  least 
one  zero  expressible  as  a  ratio  of  small  integers;  but  small 
integer  input  data  might  fail  to  stimulate  typical  rounding 
errors.  And  if  results  differ  from  what  might  ideally  have  been 
expected,  how  does  one  decide  whether  the  differences  are 
tolerable  consequences  of  unavoidable  rounding  errors,  or 
symptoms  of  a  defect  in  the  program  that  must  be  repaired? 

A  simple  procedure  that  seems  at  first  free  from  the  dilemmas  is 
to  reconstruct  the  cubic  from  its  computed  zeros  X,  Y,  Z  by 
expanding  A(x-X) (x-Y) (x-Z)  in  powers  of  x  .  If  the  cubic  so 
reconstructed  matches  the  given  cubic  well  enough,  the  program 
that  computed  the  zeros  cannot  be  too  wrong.  But  how  well  is 
"well  enough"  ?  Presumably  the  reconstruct i on  need  match  no  more 
accurately  than  if  X,  Y  and  Z  were  correct  zeros  each  rounded 
off  to  working  precision  (though  actually  they  might  be  far  less 
accurate  than  that);  and  the  rounding  errors  that  accrue  during 
the  reconstruct i an  process  have  to  be  allowed  for  too.  It's  not 
so  simple  after  all. 

Program  testing  is  fraught  with  anxiety  unless  one  can  estimate 
mathematical ly  how  big  the  errors  should  not  be.  Such  an  estimate 
of  uncertainty  can  be  very  difficult;  I  would  much  rather  have  to 
write  a  program  than  have  to  analyze  its  errors  or  test  it. 

The  program  REVAL  below  computes  a  rigorous  and  fairly  sharp 
bound  d,  for  the  contribution  of  roundoff  to  the  computed  value 
Q  of  a  cubic  Q(z)  1=  Az3  +  Bz2  +  Cz  +  D  at  the  same  time  as  it 
computes  Q  .  REVAL  requires  knowledge  about  bounds  for  every 
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rounding  error  committed  by  the  computer  in  response  to  statement’s 
like  "  s  1=  x+y  ;  d  x-y  ;  p  ;  =  x *y  ;  "  in  a  program. 

These  assignments  store  in  the  computer's  memory  values  s,  d  and 
p  slightly  different  -from  the  ideal  sum,  difference  and  product 
desired.  Almost  every  modern  computer's  arithmetic  has  its  own 
character i st i c  tiny  constants  z  and  S  that  satisfy 

Is  -  (x+y)  I  <  8  Is  I  ,  Id  -  (x—y)  I  <  $  Id  I  ,  Ip  -  x*y!  <  z  !x*y! 
for  all  non-pathol ogi cal  values  ;<  and  y  representabl e  in  the 
computer  (  ignore  oo  and  over/underf low  for  now  )  .  Ideally 

S  =  z  =  (  1.000... 001  -  1.000... 000  ) /2 

but  some  computer  arithmetics  are  somewhat  worse,  and  many  suffer 
larger  values  of  z  for  complex  multiplication  than  for  real. 

To  apply  the  foregoing  inequalities  to  the  error  analysis  of  any 
program  that  computes  Q  ,  first  decompose  the  program  into  a 
sequence  of  simple  assignments  like 

q0  1  =  A*z  ;  qt  q0  +  8  ;  .  .  .  ;  qa  1=  qj#z  ;  Q  1=  q3  +  D  . 

Then  replace  them  by  the  inequalities  they  actually  satisfy; 

Iqo  -  Az |  <  s I Az |  ;  I q i  —  <q0+B)  I  <  8  |q,  !  ;  ... 

...  ;  !  a  ;  -  qaz  I  <  £  I  qaz  I  ;  I 0  -  (q3  *■  D)  |  <  $  I 0  I  . 

These  several  inequalities  boil  down  to  one  of  the  form 

I  0  -  (Az3  +  Bz*  +  Cz  +  D)  I  <  ^ 

wherein  A  is  expressed  in  terms  of  s  and  S  and  various 
computed  values.  Hence,  A  can  be  computed  too  thus; 

Procedure  REVAL (  Z,  A,  B,  C,  D,  Q ,  &>: 

...  Given  real  coefficients  A,  B,  C,  D  ,  this  procedure  yields 

...  an  approx i mati on  Q  to  G(Z>  1=  AZ3 +  BZ2  +  CZ  +  D  and  a 

...  bound  A  >  |<3-G(Z)|  ,  which  would  be  zero  if  no  roundoff 

...  occurred.  Instead,  constants  8  and  z  that  reflect  the 

...  computer's  roundoff  must  be  put  into  the  program.  A  bigger 

...  s  may  be  needed  for  complex  arithmetic  than  for  real, 
e  :=  lAU/Cs+S)  ; 

q,  :=  fl  Z  +  B  ;  e  |Z|e+lq,l  ; 

qa  J*q.Z  +  C  ;  e  :=  IZI  e  +  lq»l  ; 

Q  :=»  qaZ  >  D  ;  A  .*  =  ( s+5  )  I  Z  I  e  +  I Q I  8  ; 

Return  ;  End  REVAL  . 

How  might  REVAL  be  tested?  After  proving  that  no  computed  value 
of  Q  can  differ  from  an  accurate  evaluation  of  Q(Z)  by  more  in 
magnitude  than  A  ,  we  have  to  show  also  that  the  error  bound  A 
is  not  so  pessimistic  as  to  be  useless.  Among  large  collections 
of  trial  data,  A  should  sometimes  barely  exceed  I Q  -  GHZ)  I  ; 
the  only  way  to  verify  this  is  to  compute  Q(Z)  more  accurately. 

This  procedure  REVAL  can  serve  to  test  the  quality  of  Z  as  an 
approximate  zero  of  the  cubic;  compute  the  quotient  101/ A  .  A 
quotient  no  bigger  than  2  ,  say,  indicates  that  no  substantial 
improvement  in  the  accuracy  of  Z  is  likely  to  be  achieved  unless 
arithmetic  is  carried  out  to  higher  precision.  Of  course,  if  you. 
believe  QBC  works  correctly  you  must  believe  that  I Q I / A  will 
be  fairly  small  at  every  computed  zero,  in  which  case  you'll  not 
bother  to  compute  that  quotient.  But  REVAL  has  another  use. 
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A  bound  upon  the  error  in  any  approximate  zero  Z  can  be  derived 
■from  REVAL's  bound  A  >  I Q  -  GHZ)  |  ,  among  other  things,  no 

matter  what  the  provenance  of  Z  .  I-f  Z  is  accurate  enough,  one 
step  o-f  Newton's  iteration  -from  Z  to  Z  -  Q(Z)/Q'(Z)  nearly 
doubles  its  number  o-f  correct  digits,  in  which  case  Q(Z)/Q'(Z> 
must  approximate  the  error  in  Z  -fairly  closely.  That  quotient 
is  never  much  smaller  than  the  error  because,  in  general ,  GHz) 
must  have  a  (possibly  complex)  zero  z  no  farther  from  Z  than 
3|Q(Z)/Q'(Z)  I  ,  according  to  a  theorem  of  Laguerre.  REVAL's 
f  0 1 +A  overestimates  (GHZ)  I  ;  and  an  estimate  of  Q‘tZ)  comes 
either  from  AZ*  +  q,Z  +  qa  ,  as  in  EVAL,  or  from  A(Z-X)  <Z-V) 
where  X  and  Y  approximate  the  other  two  zeros  of  the  cubic.  One 
way  or  another ,  ( 1 Q | +A)  / I Q'  ( Z )  I  provides  at  1  east  a  rough  bound 

for  the  error  in  Z  . 

A  rigorous  error  bound  derived  from  Laguerre's  theorem  requires 
a  rigorous  lower  bound  for  |Q'(Z)|  ,  which  could  be  obtained 

from  an  augmented  version  of  REVAL  that  accounted  for  roundoff's 
contribution  to  Q'(Z)  as  well  as  to  Q(Z)  .  Alternative! v,  if 
approximate  zeros  X,  Y,  Z  are  in  hand,  three  calls  to  REVAL. 
would  help  overestimate  the  right-hand  sides  of  the  inequalities 

I x - X  I  <  3  I Q (X)  | / I  A (X-Y)  (X-Z)  I  , 
ly-Yi  <  3IGHY)  !/|A(Y-Z)  (Y-X)  !  and 
Iz-ZI  <  3|Q(Z)  I / (A(Z — X)  <Z-Y>  !  , 

which  rigorously  bound  the  true  zeros  x,  y,  z  of  Q  unless  they 
are  clustered  so  closely  that  these  three  estimates  overlap.  But 
rigorous  bounds  differ  significantly  from  the  previous  paragraph's 
rough  bounds  only  when  zeros  are  clustered,  and  then  time  spent 
to  get  rigorous  but  probably  dismal  bounds  might  be  better  spent 
computing  more  accurate  zeros  with  the  aid  of  preconditioning. 


7.  Precondi tioning: 

Since  error  bounds  are  so  often  pessimistic,  one  might  suspect 
that  error  analysts  are  pessimists  too.  Actually,  error  analysts 
are  less  interested  in  over-estimating  error  than  in  diminishing 
it.  One  way  to  diminish  roundoff  error  is  preconditioning ,  a 
process  that  transforms  a  problem  hypersensitive  to  roundoff  into 
a  problem  that  is  similar  but  far  less  sensitive. 

The  simplest  illustration  of  the  process  concerns  a  quadratic 
equation  in  the  form 

ax*  -  2bx  +  c  =  0  , 

a  form  more  convenient  for  our  purpose  than  the  usual  form 
Ax*  +  Bx  +  C  =  0  from  which  we  get  the  desired  form  by  setting 
a  1  =  -2A,  b  B  and  c  ‘=  -2C  .  This  equation  is  hypersensitive 
to  rounding  errors  and  also  to  any  other  perturbations  of  its 
coefficients  just  when  its  roots  are  relatively  nearly  coincident, 
in  which  case  computed  roots  can  be  inaccurate  in  almost  half  the 
figures  carried.  For  instance,  when  a  =  100002  ,  b  =  t 00001 
and  c  =  100000  ,  the  true  roots  x  =  1  and  x  -  0.9999800004... 
differ  in  their  5tr>  digits  from  the  double  root  x  =  0.9999 900002 
computed  on  a  10-digit  calculator  using  the  familiar  formula 

x  =  (b  +  /<b*-ac))/a  ; 

but  the  computed  roots  are  just  what  would  have  been  obtained  in 
exact  arithmetic  had  the  coefficients  b  and  c  first  been  altered 
in  digits  beyond  their  10***  to  b  =  100001.00000  00004  and 
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c  =  t OOOOO. 0000/  00005  99996  OOOOS  .  Such  tiny  perturbations  are 
enough  to  cause  relatively  serious  errors  in  yr(b2-ac)  ,  errors 
avoidable  only  by  carrying  in  worst  cases  twice  as  many  sig.  dec. 
in  our  cornputati  ons  and  honoring  twice  as  many  sig.  dec.  in  the 
coefficients  as  we  wish  to  guarantee  correct  in  computed  roots. 

When  are  the  coefficients  likely  to  be  known  so  accurately7  Most 
likely  when  they  are  known  exactly,  and  then  most  likely  when 
they  are  integers.  Therefore,  let  us  consider  the  case  when  a  , 
b  and  c  are  all  integers  and,  to  simplify  the  exposition,  let 
us  assume  that  they  are  representabl e  exactly  in  floating-point 
with  a  digit  to  spare.  This  means  integers  with  no  more  than  9 
digits  on  a  10-digit  calculator,  no  more  than  23  bits  on  a 
computer  that  performs  binary  floating-point  with  24  sig.  bits. 

If  the  coefficients  were  rather  smaller  than  that,  so  small  that 
the  products  ba  and  ac  were  both  representabl e  exactly,  then 
the  discriminant  q  1=  b2-ac  would  be  fully  accurate  enough  to 
produce  entirely  satisfactory  results  from  a  program  like  QDRTC 
above.  That  state  of  affairs  is  the  goal  of  the  precondi tioni ng 
function  DISC  presented  below.  Without  changing  q  =  b2-ac  , 
it  successively  diminishes  the  integers  a,  b,  c  until  either  ac 
is  negative  or  it  differs  enough  from  b2  that  DISC  1=  b2  -  ac 
can  be  computed  contaminated  only  relatively  slightly  by  roundoff. 

Real  Function  DISC<a,  b,  c>: 

...  Given  integers  a  ,  b  ,  c  all  small  enough  to  fit  exactly 
...  into  floating-point  with  at  least  a  digit  to  spare,  return 
...  DISC  =  b2  -  a  c  with  roundoff  confined  to  its  last  sig.  dec. 

If  a  c  >  0  then 

<  a  :=  lal  ;  c  :=  |c  I  ; 

loop:  if  a  <  c  then  swap (a,  c)  ;  ...  now  0  <  c  <  a  , 

n  :=»  integer  nearest  b/c  ;  ...  In-b/c!  <  1/2  . 

if  n  #  0  then  ...  (else  b*<c*/4£ac/4) 

C  oc  1=  a  -  n  b  ;  ...  exact  if  «  >.  -a 

if  <x  >  -a  then  ...  (else  2b2  >  3ac  ) 

C  b  1=  b  -  n  c  ;  ...  lb  I  £  c/2 

a  1  =  «  -  n  b  ; 

if  a  >  0  then  go  to  loop  }  >  >  ; 

Return  DISC  1=  b2  -  a  c  ;  End  DISC  . 

After  substituting  this  precondi ti oni ng  function  DISC  for  the 
function  DISC  that  accompanies  the  procedure  QDRTC  above,  we 
can  compute  the  desired  roots  Xj  +  eYj  of  our  quadratic  to  nearly 
full  accuracy  by  calling  QDRTC (  a,  -2b,  c,  X,  +  eY,,  Xa+tYa  )  . 

When  applied  to  our  example  above,  DISC (100002,  100001,  100000) 
finds  n  =»  3  and  reduces  a  ,  b  ,  c  successively  to 
«  =  100002  -  100001  =  1  ,  b  ■  100001  -  100000  =  1  ,  a  =  1  -  1  =  0 

and  then  returns  DISC  *  1  correctly  having  exploited  massive 
cancellation  without  error.  Here  are  some  more  examples: 
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a 

3234424085 

3234413351 

895275144! 

8952751442 

5309162499 

5309162499 

5309162499 


b 

1160927837 

1160928203 

1557625 

1557625 

2301700899 

2301700899 

2301700899 


c 

416690270 

416690636 

271 

271 

997864924 

997864923 

997864922 


crude  DISC 
398000000000 
-89 000000000 
0 
0 

-6000000000 

0 

5000000000 


refined  DISC 
397448345600 
-89060331630 
114 
-157 

-5110876875 

198285624 

5507448123 


true  ba-ac 
3974483456 1 9 
-89060331627 
1  1  4 
-157 

-5110876375 

198285624 

5507448123 


All  coluHins  but  the  last  were  obtained  fro*  versions  of  DISC  programmed  into 
the  HP-15C,  a  ten-figure  calculator.  The  last  column  comes  from  the  WP-71B. 
a  twelve-figure  machine,  using  a  faster  version  of  DISC  that  exploits  the 
INEXACT  flag  provided  by  IEEE  standard  p854,  to  which  the  HP-71B  conforms: 
DEF  FNq(a,b,c)  !  ...  q  :=  bA2  -  a*c  more  accurately.  (in  8ASIC ) 

iO  =  FLAG(INX,0)  !  ...  saves  and  resets  INEXACT  flag. 

'loop':  bO  =  b*b  8  aO  =  a»c  !  ...  Are  they  exact? 

IF  FLA6 ( INX , i 0 ) =0  OR  a0<=0  THEN  SOTO  'fin' 

IF  ABS(c) >ABS < a)  THEN  aO=a  0  a=c  0  c*aO  1  ...  swap(a,c) 
bO  *  RED (b  ,c)  0  n  =  IROUND ( (b-bO) /c )  1  ...  RED  is  IEEE  rent 

il  *  FLAG ( INX , 0)  !  ...  resets  INEXACT  flag. 

aO  =  (a  -  n * b )  -  n*bO 

IF  FLAG ( I  NX ) =0  THEN  a  *  aO  8  b  »  bO  0  GOTO  'loop' 

'fin':  FNq  =  b*b  -  a*c  8  END  DEF 


An  idea  similar  to  that  in  DISC  ,  but  applied  very  differently, 
serves  to  precondition  the  cubic  equation 

q(x)  I =  ax3  -  3bxa  +  3cx  -  d  =  O 
when  all  its  coefficients  except  perhaps  d  are  integers 
representable  exactly  in  floating-point  with  at  least  a  digit  or 
two  to  spare.  QBC  will  calculate  the  equation's  roots  but,  in 
the  light  of  error  analyses  mentioned  above,  we  must  expect  the 
calculated  roots  to  suffer  badly  from  roundoff  whenever  they  are 
clustered.  Fortunatel y  that  possibility,  clustered  roots,  can 
be  recognized  easily  without  any  call  upon  QBC  ;  if  all  three 
roots  are  nearly  coincident  then  all  three  quotients  b/a,  c/b  and 
d/c  must  be  nearly  coincident  too.  In  fact,  a  little  algebraic 
manipulation  suffices  to  prove  that  the  quotients  match  to  beyond 
twice  as  many  sig.  digits  as  are  common  to  the  roots.  To  exploit 
this  phenomenon,  choose  x  to  approximate  all  three  quotients 
rounded  to  no  more  sig.  digits  than  are  left  unoccupied  by  the 
first  three  coefficients;  this  means  that  all  three  products  Xa, 
Xb  and  Xc  will  be  computed  exactly  in  floating-point  arithmetic. 
Next  replace  x  by  X+y  in  the  given  equation  to  get  a  new  cubic 

q(x+y)  =  ay3  -  3b'ya  +  3c*y  -  d*  =  0 

which  QBC  can  solve  for  roots  y  ,  whence  x  =  x+y  ,  much  more 
accurately  than  before.  New  coefficients  must  be  calculated  thus; 
d'  :=  d  -  Xc  ;  c'  :=  c  -  xb  ;  b'  :=  b  -  xa  ; 

d’  1=  d'  -  xc’  ;  c’  :=  c'  -  xb'  : 

d*  ,*=  d'  -  xc*  . 

Cancellation  will  occur  in  the  first  row  without  error;  and  if 
rounding  errors  do  occur  later  they  will  be  far  tinier  than  what 
QBC  would  likely  inflict  upon  the  original  coefficients.  When 
all  three  roots  x  are  extremely  close,  so  close  that  all  three 
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roots  y  must  be  relatively  nearly  coincident  too,  no  rounding 
errors  will  occur  during  the  calculation  of  the  new  coe-f-f  1  ci  ents 
b’,  c*  and  d*,  and  then  the  -foregoing  trans-f orrnati on  may  be 
repeated  advantageous! y  with  a  new  tinier  x  . 

When  two  roots  are  nearly  coincident  but  relatively  tar  +rorn  the 
third,  the  three  quotients  above  must  be  replaced  by  two  values 
(i/2)  (be  -  ad)/(b2  -  ac)  and  ±Y <  (c2  -  bd)/(b2  -  ac)  )  . 

They  can  be  shown  to  match  to  about  twice  as  many  sig.  digits  as 
are  in  agreement  between  the  two  nearly  coincident  roots:  and  >. 
must  approx i mate  those  two  values  rounded  to  at  most  half  as  many 
digits  as  are  left  unoccupied  by  the  first  three  coefficients,  so 
that  all  three  products  x2a,  X2b  and  Xc  will  be  computed  exactly 
in  floating-point  arithmetic.  Then  the  new  coefficients  and  the 
roots  x  =  x+y  may  be  calculated  as  above  except  when  d  turns 
out  to  be  small  compared  with  ax3  .  In  that  special  case,  the 
third  root  will  be  rather  smaller  than  the  two  that  are  nearly 
coincident,  so  it  rnay  well  be  computed  more  accurately  from  the 
original  coefficients  than  from  the  new  ones.  Moreover,  in  case 
d  is  small  and  not  an  integer,  the  formulas  for  d',  d*  and  d* 
should  be  changed  as  follows  for  better  accuracy  in  the  nearly 
coincident  roots  X+y  : 

D  1  =  integer  nearest  d  ;  5  1  =  d  -  D  ; 

d'  .*=  D  -  Xc  ;  d*  :=  d'  -  Xc'  ;  d-  ,*=  (d‘  -  Xc1)  +  £  . 

A  detailed  explanation  to  justify  the  foregoing  procedures  is  too 
complicated  to  include  in  these  notes.  Instead,  a  few  examples 
will  illustrate  the  schemes'  efficacy. 


These  examples  were  all  worked  out  on  an  HP-15C  calculator,  which  carries  10 
sig.  dec.  First  the  zeros  x  of  each  given  cubic  q(x)  were  obtained  from  a 
prograe  like  QBC  ,  listed  at  the  end  Df  these  notes,  to  see  haw  inaccurately 
it  coaputes  clustered  zeros.  Then  quotients  of  coefficients  were  examined  to 
deteraine  a  choice  of  x  froii  which  new  coefficients  of  q(x+y)  were  derived. 
The  interaediate  results  of  this  computation  are  displayed  below  with  strings 
of  leading  "o's"  to  denote  digits  that  cancelled  off.  Then  QBC  was  rerun 
to  comoute  the  zeros  y  of  q(x+y)  ,  fro#  which  were  obtained  improved  zeros 
x  =  X+y  whose  correctness  was  verified  on  an  HP-71B  carrying  12  sig. dec. 

Q(x>  *  638x3  -  1 90! 25x 2  +  18311811x  -  387898164 


QBC:  x  » 

96.297  ,  96.34/ 

,  96.305 

b/a  »  96. 

31458967  c/b  * 

96.31458777  d/c  = 

96.31458582 

a  a  658 

b  =  63375 

c  =  6103937  d 

=  587898164 

b'  =  oooo9.6 

c'  =  oooo924.5  d' 

=  oooo89030.9 

c*  =  ooo.o2  d* 

a  ooool . 55 

d* 

a  -0.376 

Q (X+y )  = 

658y3  -  28. 8y2 

+  0 . Q6y  +  0.376 

QBC:  X+y 

=  96.22963935, 

96.35706483  +  O . 06974975204 z 
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Q < x )  3  22121 1  lx3  -  73449x 2  +  81 3x  -  3 

SBC:  x  =  0,01 109692665  ,  0.0110530996/  +  0.000200902 9481  i 

b/a  3  0.0110677  c/b  3  0.0110689  d/c~3  0.0110701  x  :=  0.0111 

a  3  2212111  b  =  24483  c  *  271  d  =  3 

b'  3  -aaa7l.AZ2l  c‘  -  -oao.7613  d'  -  -0.008J 

c"  3  0. 03159631  d  “  =  0.00o35043 

d*  3  -0 . 000ooo28904 1 

Q ( X  +v )  =  22121 1 1 y 3  +  214. 2963y2  +  0.09478893y  +  0.000000289041 

QBC:  X+y  3  0.01109693006  ,  0.01105309791  £  0. 000200903481 4  1 

X  is  not  critical,  nor  is  a  snail  rounding  error  in  d*  .  Here  is  the 
previous  example  repeated  with  a  different  x  :  =  0.01107  : 
a  3  2212111  b  3  24483  c  =  271  d  =  3 

b'  3  -oooo5. 06877  c'  =  -aaa.o2681  d'  =  a.aaooj 

c‘  3  0.0293012839  d ■  3  0.0003267867 

d*  3  0. 000oo2421 4872. . 

Yet  QBC  delivers  practically  the  sane  final  results  x+y  as  before. 

Q (x )  3  61 1  lx3  -  5 1 792x *  +  109737x  +  0.00623 

QBC:  x  3  -5.677209907.o-8  ,  4.237 477594  ,  4.23773//05 

(bc-ad) / <2<ba-ac) )  3  /(  (c*-bd ) / (ba-ac 1  )  3  4.237604349  X  1 3  4.24 

a  3  6111  b  3  17264  c  3  36579  0=0  S  3  d  3  -0.00623 

b'  3  -08646.64  c'  3  -36620.36  d'  3  -155094.96 

C  3  00041.3936  d “  3  000175.3664 

d*  3  -000.148694 

Q (X+y)  3  61 1 1 y3  +  25939. 92y*  +  124.1 808y  +  0.148694 

QBC:  X+y  3  -o.ooooooo56  ,  4.237583786  ,  4.237624911 

d  is  so  tiny  that  the  isolated  root  is  best  calculated  directly  from  Q(x). 


The  foregoing  discussion  may  promote  a  misleading  impression  that 
preconditioning  is  worth  while  only  if  the  data  (coef f i ci ents)  are 
given  exactly.  Other  ci rcurnstances  do  exist  when  preconditioning 
helps,  however.  For  example,  the  errors  in  the  data  could  be 
correlated  in  a  way  that  is  known  to  mostly  cancel  in  the  results. 
Or  the  coefficients,  though  uncorrel atedl y  erroneous,  may  figure 
subsequently  in  several  related  contexts  among  which  consistency 
of  some  kind  is  essential  even  though  ultimate  accuracy  is  not. 

For  instance,  suppose  a  program  uses  the  zeros  of  the  cubic  and 
also  of  its  derivative;  Rolle's  theorem  implies  that  the  latter 
zeros  should  lie  between  the  farmer  when  they  are  all  real,  and  a 
theorem  due  to  Gauss  places  the  latter  inside  the  convex  hull  of 
the  former  when  they  are  complex.  If  those  rel ati onshi ps  are 
violated  by  clustered  approximate  zeros  computed  too  inaccurately, 
the  subsequent  logic  of  the  program  could  malfunction.  Adapting 
that  logic  to  disordered  zeros  can  be  far  more  complicated  than 
precondi ti oni ng  in  a  way  that  protects  their  order  from  roundoff. 
However,  preconditioning  procedures  appropriate  for  noninteger 
data  go  far  beyond  the  scope  of  these  notes. 
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8.  Scaling  Invariance  vs.  Over/Underf  1  ow: 

The  factored  form  of  the  cubic 

Ax3  ■+■  B  xa  +  C  x  +  D  =  A  <x  -  X)  (;•:  -  Y)  (x  -  Z) 
provides  a  f actor i zati on  for  the  scaled  cubic 

( ir  A )  x 3  +  (crBp)x*  +  (grCp2)x  +  <*-Dp3>  =  ,J-A  (x  -  pX>  ( x  -  pY>  (;<  -  pZ)  . 

If  the  scale  factors  <r  and  p  are  powers  of  the  radix  (  10  for 
a  decimal  calculator,  2  for  a  binary  computer),  then  the  scaled 
coefficients  >rA,  <rBp  ,  <rCp2,  ^-Dp3  will  have  the  same  significant 
digits  as  the  original  coefficients  A,  B,  C,  D  ;  only  the 
decimal  or  binary  points  will  have  shifted.  Therefore  the  same 
should  be  true  of  the  scaled  zeros  pX,  pY,  pZ  ,  even  in  the  face 
of  roundoff.  Of  course,  the  relationship  between  the  scaled 
zeros  and  the  original  zeros  X,  Y,  Z  must  break  down  when  the 
scale  factors  are  so  big  or  so  tiny  that  the  scaled  coefficients 
or  zeros  over/underf 1 ow;  ideally  the  relationship  should  not 
break  down  for  any  other  reason.  In  practice,  most  algorithms 
are  vulnerable  to  spurious  over/underf low.  For  instance,  the 
discriminant  q  in  QDRTC  and  the  quotients  r  and  t  in  QBC 
can  easily  over/underf low  even  though  the  coefficients  and  zeros 
lie  well  within  range.  Consci ent i ous  programmers  introduce  scale 
factors  into  their  programs  either  to  f orestal 1  undeserved  over/ 
underflows  or  to  recover  from  them.  The  task  is  not  eased  by  the 
absence  from  most  programming  languages  of  any  reference  to  over/ 
underflow  other  than  an  implication  that  the  crime  will  be 
punished  by  termination  of  the  program's  execution. 

Here  is  how  a  scale  factor  <r  can  be  chosen  to  prevent  spurious 
over/underf low  during  the  solution  of  a  quadratic  equation 
Ax2  +  Bx  +  C  38  0  .  If  ■  A  a  0  or  C  =  0.  the  solution  i.s  obvious. 
Otherwise  choose  <r  to  be  a  power  of  the  radix  near  VI A  |  VI  Cl  , 
and  so  chosen  that  neither  A/>r  nor  C / -r  can  over/underf  1  ow.  Then 
I  (A/c r)  (C / <r)  |  cannot  be  orders  of  magnitude  larger  or  smaller  than 
1  .  Next  compare  |B|  with  <r  ;  if  I B I  is  so  much  bigger  than 
<r  that  IBI  +  <r  rounds  to  |B|  ,  then  the  quadratic's  roots  are 

approximated  accurately  enough  by  -C/B  and  -B/A  .  Otherwise 
call  QDRTC(A/<r,  B/<r,  C/<r,  X,  +  tY,  ,  Xa  +  eYa)  ,  allowing 
underflows  to  flush  to  0  if  nothing  better  is  available.  No 
undeserved  overflow  will  occur. 

Similar  ideas  can  help  suppress  spurious  over/underf lows  when 
solving  the  cubic.  Roughly  speaking,  when  A/B  is  very  tiny, 

much  tinier  than  roundoff  in  numbers  near  1  ,  but  B/C  is  not 

tiny  at  all,  then  the  cubic's  biggest  zero  must  be  very  nearly 
-B/A  ,  and  the  other  zeros  can  be  found  by  setting  A  :=  0  and 

solving  the  resulting  quadratic  equation.  And  when  D/C  is  very 

tiny  but  C/B  is  not,  the  tiniest  zero  is  very  nearly  -D/C  , 
and  so  on.  When  neither  A/B  nor  D/C  is  very  tiny,  the  cubic 
and  its  zeros  can  be  scaled  and  computed  in  the  ordinary  way. 
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9.  Some  Trial  Data  -for  Cubic  Equation  Solvers: 

Notation  : 

Coe-f  +' i ci ents  A,  B,  C,  D  o-f  cubic  Ax3  +  Bx*  +  Cx  +  D  are  input. 
Output  are  real  zeros  X,  Z, ,  Za  or  complex  zeros  Z  . 

Parameters:  M  is  a  small  integer;  N  is  a  big  integer;  usually 

IN |  is  almost  as  big  as  possible  without  roundo-f-f. 
u  1=  M/N  ;  v  1=  1 / ( 2N )  . 

t  is  a  tiny  number;  1000  +  t  rounds  to  1000  . 

h  is  a  huge  number;  h  +  1  rounds  to  h  . 

Follow  the  foriulas  for  coefficients  EXACTLY;  rounding  thee  could  change  zeros  drastically. 

Cubics  with  small  integer  coefficients: 

A  =  1  ,  B  =  D  =  -6  ,  C  =  1  1  X  =  3  ,  Z,  =  1,  Z2  =  2  . 

A  =  D  =  1  ,  B  =  C  =  0  .  X  =  -1  ,  Z  =  0.5+  (VO. 75  . 

A  =  -D  =  1  ,  B  =  C  =  0  .  X  =  1  ,  Z  =  -0.5  +  (VO. 75  . 

A  =  0  ,  B  =  1  ,  C  =  3  ,  D  =  2  .  X  =  00  ,  Z»  =  -1  ,  Z2  =  -2  . 

A  =  1  ,  B  =  -3  ,  C  =  2  ,  D  =  0  .  X  =  0  ,  Z,  =  1  ,  Z2  =  2  . 

A  =  D  =  1  ,  8  =  0  =  3.  X  =  Z,  *  Z*  =  -1  . 

A  =  -B  =  -C  =  D  =  1  .  X  =  -1  ,  Z,  =  Z2  =  1  . 

A= 1 ,  B=-30,  0=299 ,  D=-1980  .  X  =  20  ,  Z  =  5  +  z/74  . 

Cubics  with  zeros  o-f  very  di-f-ferent  magnitudes: 

A=1 ,  B=-30,  0=299,  D=-t  .  X  9  t/299  ,  Z  9  15  +  c/74  . 

A  =  -D  =  t  ,  -B  =  C  =  h  .  X  =  1  ,  Z,  9  t/h  ,  Z2  9  h/t  . 

A  =  1  ,  B  =  -h  ,  C  =  -t  ,  D  =  ht  .  X  =  h  ,  Z  =  +/t  . 

A  =  D  =  1  ,  B  =  C=  1-N-l/N  .  X  =  1/N  ,  Z,  =  -1  ,  Z2  =  N. 

Cubics  with  ill-conditioned  zeros: 

A  =  -c  =  N+l  ,  D  =  -B  =  N-l  .  -X  =  Z,  =  1  ,  Za  =  1  -  2/CN+l)  . 

A  =  -D  =  N  ,  0  =  -B  =  3N+2M  .  X  =  1  ,  Z  =  1 +u  +  /(2u+ua)  . 

A=B=3C,  C  =  9N3 ,  D  =  1 -N3  .  X  =  <l-2v>/3  ,  Z  =  1+v  +  tv/3  . 
A=D=  Na+Ma,  B=C=  3Ma-Na.  X=-l ,  Z  =  l-2ua/(t+ua)  +  2zu/<l+ua>. 

A=-D=Na+Ma ,  -B=C=3Na— Ma  .  X=1 ,  Z  =  l-2ua/Cl+ua)  +  2zu/<l+ua>  . 


10.  Selected  Results  -from  the  HP-15C  : 

Both  algorithas  above,  one  using  the  foriula  with  coeplei  arcsin,  one  like  BBC  that  iterates  to 
solve  a  cubic,  have  been  prograeoed  into  the  HP- 1 5C  calculator  along  with  a  prograe  like  REVAL  to 
coaputc  fl(x)  and  A  .  The  results  tabulated  belou  show  the  coefficients  A,  B,  C,  D  of  the  cubic  Six) 
and  the  zeros  X,  Y,  Z  obtained  first  froe  the  programed  Foraula,  second  froe  exact  calculation  on 
another  aachine,  third  fron  the  iterative  icthod  BBC.  Below  BBC's  results  are  shown  corresponding 
quotients  |0(X)|/A(X),  |B(Y>|/A(Y),  16(1)1/6(2)  as  coaputed  by  a  prograe  like  REVAL  . 

The  HP-13C  rounds  arithietic  to  10  sig.  dec.,  corresponding  to  I  *  c  *  3e-10  . 


A  »  1 

Foriula 

X  »  | 

B  a  -6 

Correct 

X  »  1 

C  a  11 

Iter've 

X  »  1 

0  a  -6 

101/4 

0 

Y  »  2  1  *  3 

Y  «  2  Z  »  3 

Y  «  2  Z  *  I 

0  0 


A  a  -| 

Foriula 

X  =  -I 

B  a  0 

Correct 

X  »  -l 

C  »  0 

Iter've 

X  »  -1 

D  a  -1 

101/A 

0 

Y  *  0.49WWW  ♦  0.8660254037  t 

Y  *  0.3  +  0.B66O234O3784  t 

Y  *  0.3  ♦  0,8660234038  i 
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A  -  1 
B  3  0 
C  3  -2 
D  *  *5 


Foreula  X  =  -!  *  Y  *  Z 

Correct  X  3  -1  =  Y  5  Z 

Iter've  X  3  -I  3  Y  3  Z 

181/A  0  ...  ,  so  the  programs  work  at  least  soeetiees. 


Foreula  II  3  2.094551481 
Correct  X  3  2.0945514815 
Iter've  X  3  2.094551481 
IOI/A  0.27 


Y  3  -1.04727574!  ♦  1.135939889  i 

Y  3  -1.0472757408  *  1.1359398891  ! 

Y  3  -1.047275741  +  1.135939889  t 

0.15 


leu ton 's 


exaeple. 


A  3  I 
B  3  -3 
C  3  2 
D  3  0 


For  ml  a  X  3  4e-10 
Correct  X  3  0 
Iter've  X  3  0 
181/A  0 


is  oeinous. 


A  3  1 
B  3  -3 
C  3  2 

D  3  2. 34e-89 


Foreula  X  3  4e-10 
Correct  X  3  -1.17e-89 
Iter've  X  3  -1.17e-89 
181/A  0 


Y  3  1 

Y  3  1 

Y  3  1 

9.4e-81 


X  is  wrong. 

This  is  why  3BC  has 
1.000, .001  in  it. 


2.9e-81 


A  3  1 

B  3  -7999999999 
C  3  -8400000002 
0  3  16000000000 

A  3  16000000000 
8  3  -8000000002 
C  3  -7999999999 
0  3  1 

A  3  1 

B  3  -99999.00001 
C  3  -99999.0000! 
D  3  I 

A  3  0.01 
B  3  -300 
C  3  2990000 
D  3  -299 


Foreula  X  3  7999999998 
Correct  X  3  8000000000 
Iter've  X  3  8000000000 
|8|/A  0 

Foreula  X  3  1 . Oe- 1 0 
Correct  X  3  1.25e-10 
Iter've  X  3  1.25e-10 
181/A  0 


Y  3  43545 

Y  3  1 

Y  3  1 

0 


Z  3  -43545 
Z  3  -2 
Z  3  -2 
0 


Y  3  0.9999999999  Z  3  -0.4999999999 

Y  3  1  Z  3  -0.5 

Y  3  1  Z  3  -0.5 

0  0.1 


Y  and  Z  are  very  wrong. 


.,  Y  and  Z  are  0.  K. 

now,  but  X  isn't. 
This  cubic  is  the 
previous  one  reversed. 


Foreula  X  3  99999.99997 
Correct  X  3  100000 
Iter've  X  3  100000 
181/A  0 


Y  3  -1.0443  Z  3  0.04433  ...  Y  and  Z  are  wrong  again, 

Y  3  -1  Z  3  0.00001  and  reversing  the  cubic 

Y  3  -1  Z  3  0.00001  won't  ieprove  Y  . 

0.1  0 


Foreula  X  3  1. Ole-4  -  2e-6  i 
Correct  X  3  1. 0000000 1003e-4 
Iter've  X  3  1.0000000 t0e-4 
181/A  0 


Y  3  14999.99995  ♦  8602.325194  ( 

Y  3  14999.99995  ♦  8602.32517986  i 

Y  3  14999.99995  ♦  8602.32518  i 

0.00015 


The  Foreula’ s  value  X  is  wrong  in  the  worst  way:  wrong  enough  to  eatter,  but  not  obviously  wrong. 

A  3  -3  Foreula  X  3  0.3333333333  3  Y  3  Z 

B  3  3  Correct  X  3  0.333178613706  Y  3  0.333410693147  +  0. 0001 33991 12B1 29  i 

C  3  -I  Iter've  X  3  0.3333333333  Y  3  0.3333366667  Z  3  0.33333 

D  3  0.1111111111  181/A  0  0  0 

Better  results  cannot  be  expected  Froe  calculations  carried  out  to  10  sig.  dec.,  since  as  eany  as  two 
thirds  of  the  Figures  carried  can  be  lost  iF  all  three  zeros  are  nearly  coincident. 


A  3  10000000000  Foreula  X  3  -0.9999999997  Y  3  0.9999999998  ♦  0.00001721325932  : 

B  3  -9999999998  Correct  X  3  -I  Y  3  1  Z  3  0.9999999998 
C  3  -A  Iter've  X  3  -I  Y  3  1.000014142  Z  3  0.999985858 
0  3  -B  181/A  0.057  0.27  0 

Better  results  cannot  be  expected  Froe  calculations  carried  out  to  10  sig.  dec.,  since  as  eany  as  halF 
the  Figures  carried  can  be  lost  iF  two  zeros  are  nearly  coincident  but  Far  Froe  the  third.  Note  that  any 
prograe  that  coeputes  X,  Y,  Z  as  well  as  can  be  expected  should  produce  values  For  [Of/A  sealler  than 
1  or  2  ,  but  the  seal  1  ness  oF  that  quotient  does  not  by  itselF  tell  how  accurate  a  coeputed  zero  eay  be. 
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11.  A  Program  for  the  HP-15C: 

This  program  deals  with  rubies  Q<x)  =  ax3  +  bxa  +  c:<  +  d  and 
quadratics  rxa  +  sx  +  t  .  Function  keys  E  A3,  EBJ ,  CC3  ,  EDI.,  EEJ 
and  Cl  I  are  invoked  via  CGSB3  CA3  etc.  Stack  register  X  is 
normally  displayed;  to  see  the  other  registers  Y,  Z  and  T  ,  use 
the  CR'kJ,  CRf J  or  CX£Y]  keys.  Here  is  what  the  program  does: 

CAT:  a  CENTER J  b  CENTER J  c  CENTER]  d  C A3  stores  a,  b,  c,  d 

in  cells  #3,  #4,  #5,  #6  resp.  -for  ... 

CB3s  Using  coefficients  a,  b,  c,  d  stored  by  CA3  ,  solves 

Q ( x )  =0  for  roots  X,  Y,  Z  by  means  of  a 

formula  involving  complex  arcsin.  Scratches 
cells  #7,  #8,  #9  . 

EE]:  Using  coefficients  a,  b,  c,  d  stored  by  CA3  ,  solves 

Q(x)  =0  for  roots  X",  Y,  Z  as  QBC  does  by 

iteration  and  deflation.  X  is  real  and  also 
in  cell  #9  ;  Y  and  Z  are  comp 1  ex - 
conjugates.  Scratches  cells  #7  and  #8  . 

CC3:  Using  coefficients  a,  b,  c,  d  stored  by  CA3  ,  copies 

X  into  Z  and  T  ,  writes  I X I  into  cell  #7, 

writes  Q(X')  over  X  and  an  error  bound  for 
Q(X)  onto  Y  .  X  may  be  complex.  Cf .  REVAL,. 
Cl]:  Using  coefficients  a,  b,  c,  d  stored  by  CA3  ,  writes 

X  into  Z  and  T  ,  Q’ (X)  into  V'  and  Q(X> 

over  X  ;  and  if  X  was  real,  then  leaves 

aX+b  in  cell  #7  ,  (aX+b)X+c  i n  #8  . 

CD]:  r  CENTER]  s  CENTER]  t  CD]  solves  a  quadratic  equation 

rxa  +  sx  +  t  =  0  for  its  roots  X  and  Y  , 
which  may  be  complex.  r  ^  0  .  Cf .  QDRTC, 

Program  Text: 

LBLCA3  ST06  Rt  ST03  Rt  STQ4  Rt  ST03  RTN  LBLCB]  CFB  RCL4  RCL3  1=0?  GT09  f  3  CHS  f  ST07  P  ST08  3  x  RCL5 
RCLf3  -  ST09  RCL-8  RCLx7  RCL&  RCLf3  -  Ra9  1=0?  STOO  f  3  *  RCL9  .73  f  SF8  ft  f  LSTX  X<>Y  SIM"*  3  4  CFO  X<0?  SFO 
*  3  f  FO?  CHS  W  -  LSTX  SIN  Rt  :  I*Y  SIN  Rt  x  ST02  LBLO  W  CHS  3  1/X  SFB  Y*  ENTER  ENTER  I  CHS  LSTX  Y*  CHS 
x  LBL2  Ra7  X£Y  -  RCL7  LSTX  Rt  ♦  Ra+7  X* Y  LSTX  -  RTN  LBLCC]  ASS  ST07  4.006  STOI  LSTX  RCL3  ENTER  ABS  2 
f  LBL3  RCLx7  Rt  x  RCLHi)  ENTER  ABS  Rt  ♦  IS6I  ST03  LSTX  +  2  EEX  9  4  X$Y  RTN  LBLCD3  CFB  X<Y  2  CHS  f  CFO 
X<0?  SFO  STOI  X*  Rt  x  LSTX  UY  Rt  -  CHS  X>0?  BT04  CHS  /X  Rt  f  ENTER  CHS  RCLI  LSTX  f  X^Y  fl  ENTER  Rt  fl  RTN 
LBL4  ft  FO?  CHS  RCL+I  X=0?  RTN  4  LSTX  Rt  4  RTN  LBLCEI  CFB  EEX  CHS  9  ex  ST09  Ra6  X=0?  ST06  RCL4  RCL3 
X=0?  6T09  4  3  CHS  4  SS8I  RCL43  CFO  X<0?  SFO  ABS  3  i/X  Y*  X<>Y  RCL43  CHS  X<0?  ST03  ft  X>Y?  X*Y  CLX  1.325  x  ENTER 
LBLS  aX  +  FO?  CHS  -  X=Y?  BT07  LBL6  SSB1  X*Y  X=0?  6T07  4  RCL49  -  X=Y?  6T07  LSTX  FO?  CHS  X>0?  GTC7  Rt  GT06 
LBLC  1  ]  ENTER  ENTER  RCLx3  ENTER  RCL+4  ST07  ♦  x  X|Y  RCLx7  RCL+5  STOB  ♦  X*Y  LSTX  x  RCL+6  RTN  LBL7  Rt  Rt 
ST09  X=0?  STQ8  Xa  RCLx3  ABS  RCL6  Ra49  ABS  X>Y?  BT08  LSTX  CHS  ST08  RCL-5  Ra49  ST07  LBLS  RCL3  RCL7  RCL8  ST05 
LBL9  I  TANH- 1  ST09  RCL4  Ra5  RCLA  LBLS  6SBD  Ra?  RTN  (  303  steps  ) 


12.  Program  Timings: 

For  the  selected  results  from  the  HP-15C  exhibited  above,  the 
closed-form  formula  program  CB]  took  about  14  sec.  on  average; 
the  iterative  QBC  program  CE]  averaged  roughly  27  sec.  But 
program  CB3  was  inaccurate  at  times;  to  get  results  as  reliable 
as  CE]'s  ,  program  CB3  would  have  to  be  run  twice,  the  second 
time  with  coefficients  reversed,  and  then  the  two  sets  of  results 
would  have  to  be  combined  with  some  additional  arithmetic.  Thus, 
the  iterative  program  runs  faster  on  the  HP--15C  than  would  a 
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reliable  program  based  upon  closed— form  -formulas  despite  that  the 
complex  inverse  trigonometric  -functions  available  on  that  machine, 
but  on  -few  others,  promote  the  i  mpl  ementati  on  o-f  tne  formulas. 


13.  Annotated  Bibliography: 

An  old  encyclopaedi a  like  the  Britannica  is  as  good  a  place  as 
any  to  look  up  the  Italians  Scipione  Ferro,  Tartaglia  (Niccolo 
Fontana)  and  Hieronimo  Cardano,  and  the  Frenchman  Franciscus 
Vi  eta,  who  -first  produced  closed-form  solutions  for  the  cubic 
equation.  Their  formulas  can  be  found  there  too  under  the  heading 
"Equations,  Theory  of";  or  in  handbooks  like  the  Handbook  of 
Chemistry  and  Physics ,  the  Chemical  Rubber  Publishing  Co., 
Cleveland;  or  the  Handbook  of  Mathematical  Functions  edited  by 
li.  Abramowitz  and  Irene  Stegun,  #55  in  the  Applied  Mathematics 
Series  published  in  1964  by  the  U.  S.  National  Bureau  of  Standards 
but  obtainable  now  reprinted  by  Dover,  N.  Y.  The  algorithm  QBC 
presented  in  §3  and  §4  has  not  been  published  before. 

The  genesis  of  rounding  errors  on  older  electronic  computers  is 
described  well  by  Patrick  H.  Sterbenz  in  his  book  Floating- 
Point  Computation ,  published  in  1974  by  Prenti ce-Hal 1 ,  N.  J. 

A  better  arithmetic  design  is  specified  by  the  IEEE  standards 
754-1985  and  p854,  to  which  many  of  the  newest  computers  conform; 
these  standards  have  been  described  by  W.  J.  Cody  et  al .  in  "A 
Proposed  Radix-  and  Ward-length-independent  Standard  for  Floating- 
Point  Arithmetic"  in  IEEE  MICRO ,  August  1984,  pp.  86  -  100. 

An  elementary  overview  of  error  analysis  is  provided  in  parts  of 
the  HP-15C  Advanced  Functions  Handbook ,  Hewl ett-Packard  part  no. 
00015-9011,  1982.  Backward  error  analysis  in  particular  is  the 
subject  of  Rounding  Errors  in  Algebraic  Processes  by  James  H. 
Wilkinson,  Prentice-Hal 1 ,  1963.  The  error  analysis  summarized  in 
§5  has  not  been  published  yet;  its  approach  is  similar  to  that 
in  Brian  T.  Smith's  "Error  Bounds  for  Zeros  of  a  Polynomial 
••  Based  Upon  Gerschgorin’s  Theorem"  in  the  Journal  of  the  ACM 
vol .  17  (1970),  pp.  661-674,  wherein  may  be  found  also  the  proof 
of  the  claims  for  the  three  inequalities  near  the  end  of  §6  . 

96's  procedure  REVAL  is  similar  to  one  presented  and  explained 
in  "A  stopping  criterion  for  polynomial  root-finding"  by  Duane 
A.  Adams,  Communications  of  the  ACM  vol.  10  (1967),  pp.  655-653, 
The  preconditioning  techniques  in  §7  and  the  scaling  techniques 
in  §8  are  new  although  similar  in  spirit  to  techniaues  described 
in  the  author’s  lecture  notes  since  1963.  The  theorem  by  Gauss 
that  relates  the  zeros  of  a  polynomial  and  of  its  derivative,  and 
Laguerre’s  theorem  mentioned  in  §6  ,  can  both  be  found  in 
Geometry  of  Zeros  by  M.  Marden  (1966),  American  Mathematics 
Society,  Providence,  R.  I. 
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